README: Debian: extend to specify source packages #57

Merged
merged 3 commits into from
Dec 10, 2021

Conversation

gernot-h
Contributor

@gernot-h gernot-h commented Apr 3, 2019

In technical as well as compliance contexts, we often need to refer to the Debian sources of a package. There is often no trivial mapping between source and binary packages (several binaries can be built from one source, and names as well as versions can differ between binary and source packages!), so to avoid confusion, package URLs should allow source packages to be specified explicitly. After some internal discussion, we think that an extra qualifier "type" is needed, which shall take precedence over the "arch" qualifier.

@gernot-h gernot-h changed the title README: Debian: extend to specify source packages [WIP] README: Debian: extend to specify source packages Apr 9, 2019
@gernot-h
Contributor Author

gernot-h commented Apr 9, 2019

Please don't merge yet - @Silvanoc reminded me about architecture-independent packages, so falling back to source if "arch" is missing is likely a bad idea. I'll update the MR soon.

@gernot-h gernot-h changed the title [WIP] README: Debian: extend to specify source packages README: Debian: extend to specify source packages Apr 9, 2019
@gernot-h
Contributor Author

gernot-h commented Apr 9, 2019

New version pushed, now suggesting a "type" qualifier. From my side ready for merging. :-)

@bufferoverflow

@sschuberth Could you review this please?

bufferoverflow
bufferoverflow previously approved these changes May 26, 2019
@sschuberth
Member

I'm not a Debian guy, but the proposed changes look sensible to me.

@iamwillbar
Member

@lamby would you mind taking a look at this PR as a Debian expert?

@sschuberth
Member

several binaries built from one source, names as well as versions can differ between binary and source packages!

On a related note, here's a nice post from the Stack Overflow newsletter just about that.

@lamby

lamby commented Nov 15, 2019

names as well as versions can differ between binary and source packages

Some quick notes:

  • Names don't differ, they just exist in an entirely different namespace. E.g. source package foo can generate bar and baz binary packages, with no obligation to generate a foo binary package. Regarding versions, whilst you can probably convince the underlying tools to have source foo version 1.0 generate a foo binary version 1337.0, this just does not happen.

Regarding arch-dep / arch-indep / source, did you consider having an arch key that is one of i386, amd64, ppc64el, all, source (NB: the last two are "magic")? That would keep the logic out of the spec, although it naturally moves it elsewhere.

(Personally, I'm not a fan of a URL returning 200 and then modifying the querystring can make it 404, eg. by changing the arch string, but that's probably already a lost battle with respect to package-url generally)

@gernot-h
Contributor Author

gernot-h commented Nov 20, 2019

What I meant saying that "names differ" is that you can't guess from a binary package to a source package by just replacing "arch=amd64" with "arch=source", to use your arch-key suggestion. So you can have "pkg:deb/debian/[email protected]?arch=amd64" while there can't be "pkg:deb/debian/[email protected]?arch=source" - this would be "pkg:deb/debian/[email protected]?arch=source" instead.

Regarding version deviations between source and binary packages, this for sure happens. One prominent example is the "+bN" suffix which is appended for a rebuild of a new binary package from unmodified sources, see https://packages.debian.org/buster/libselinux1: binary version 2.8-1+b1 is built from source version 2.8-1. I also included a "+b1" example in the examples section in my commit. In some other cases, epoch prefixes ("1:" & friends) differ between source and binary packages.

And then, there are even such weird things like lvm2 package where the binary version "2:1.02.155-3" is built from source version "2.03.02-3", see https://packages.debian.org/buster/libdevmapper1.02.1.
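The "+bN" case described above is mechanical enough to sketch. The following Python is only an illustration and not part of this PR; the helper name `strip_rebuild_suffix` is hypothetical. It removes a trailing "+bN" rebuild suffix and nothing else, so it cannot recover cases like lvm2, where the binary and source versions are unrelated strings, nor differing epoch prefixes.

```python
import re

# Matches a trailing binary-rebuild suffix such as "+b1" or "+b12".
_REBUILD_SUFFIX = re.compile(r"\+b\d+$")


def strip_rebuild_suffix(binary_version: str) -> str:
    """Return binary_version without a trailing "+bN" suffix, if present.

    Hypothetical helper: this is only a guess at the source version.
    It does not handle differing epochs or independently versioned
    binaries (the lvm2 / libdevmapper1.02.1 case).
    """
    return _REBUILD_SUFFIX.sub("", binary_version)


# The libselinux1 example from this thread: binary 2.8-1+b1, source 2.8-1.
print(strip_rebuild_suffix("2.8-1+b1"))      # 2.8-1
# The libdevmapper1.02.1 binary version is returned unchanged, which is
# NOT the real source version (2.03.02-3).
print(strip_rebuild_suffix("2:1.02.155-3"))  # 2:1.02.155-3
```

This is why a binary purl cannot be turned into a source purl by rewriting the qualifier alone.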

Regarding "arch=source" versus "type=source", I'm more or less undecided. In fact, I even planned to propose "arch=source" at some point, but refrained because it simply sounded wrong, as "source" is not an "architecture" by definition. :)

@lamby

lamby commented Nov 20, 2019

weird things like lvm2 package where the binary version "2:1.02.155-3" is built from source version "2.03.02-3"

TIL :)

@gernot-h
Contributor Author

gernot-h commented Nov 20, 2019

@iamwillbar @lamby @sschuberth So do you have an opinion about my PR, anything I can improve so it can be merged? Shall I reword it to switch from "type=source" qualifier to (mis)using the "arch=" qualifier as suggested by @lamby? (I personally would prefer something other than "arch=", but I'm not too opinionated here... ;) ).

@lamby

lamby commented Nov 20, 2019

Please don't await or otherwise block on my feedback as I'm afraid I won't be able to commit spending a lot of time on this PR :)

@iamwillbar
Member

I'd lean towards misusing arch because it's mutually exclusive and this avoids people doing weird things like ?arch=amd64&type=source.

Thanks for the feedback @lamby, very helpful.

@bureado

bureado commented Nov 20, 2019

In certain places of APT world, src:name is used instead of name when specificity is needed. For the purposes of your PR, you could also consider pkg:deb-src instead of pkg:deb, and those semantics would also be understood in APT world.

@iamwillbar
Member

@bureado that's a really good point. Are the semantic differences between a source package and a binary package significant enough to warrant a different schema? I think quite possibly. @pombredanne would you be opposed to differentiating source and binary packages this way?

@stevespringett
Member

We already have a precedent for specifying source packages using ecosystem-specific terms.

For example

pkg:maven/org.apache.xmlgraphics/[email protected]?classifier=sources

The above would resolve the source jar rather than the binary.

Introducing ecosystem types specifically for different types of artifacts would greatly complicate things and introduce incompatibilities with existing systems.

@sschuberth
Member

Please don't await or otherwise block on my feedback as I'm afraid I won't be able to commit spending a lot of time on this PR :)

Same here.

@sschuberth
Member

In certain places of APT world, src:name is used instead of name when specificity is needed. For the purposes of your PR, you could also consider pkg:deb-src instead of pkg:deb, and those semantics would also be understood in APT world.

On a somewhat related note, ClearlyDefined does explicitly distinguish the deb and debsrc types. Maybe it makes sense to reach out to them and ask what their rationale was to do so?

pkg:maven/org.apache.xmlgraphics/[email protected]?classifier=sources
The above would resolve the source jar rather than the binary.
Introducing ecosystem types specifically for different types of artifacts would greatly complicate things and introduce incompatibilities with existing systems.

I'm also against introducing incompatibilities here, but I wonder whether the Java / JAR case is special: The structure of JARs is no different whether it contains bytecode or sourcecode, in both cases the JARs are basically just ZIPs.

But I don't know if that's also true for deb vs debsrc, are both the same file format, but the one just contains binaries whereas the other contains sources?

@bureado

bureado commented Nov 21, 2019

But I don't know if that's also true for deb vs debsrc, are both the same file format, but the one just contains binaries whereas the other contains sources?

They are not the same file format. Binary packages are the usual .deb and source packages are usually two or three files referenced from a signed metadata file, .dsc.

Some thoughts with this caveat: I just recently learned about purl from @iamwillbar, and as exciting as all of this looks, I'm still catching up to it. So I'm in the process of taking a quick look at the SDKs and downstream clients to give a more educated response here, so I would hate to block anything. In the meantime...

Generally, usage context hints to whether source or binary package are expected. Source packages can't be installed, so if someone is apt install'ing a pkg:deb or using dpkg -l to get the BOM of a running system then binary packages are implied. If, on the other hand, someone is referring to a pkg:deb in the context of say pbuilder, then source packages can be implied.

But I guess the problem at hand is that purl is context-unaware and has many more use cases. For example, if you're using purl to fetch a file/files from a repo, then the context can't be implied and you need a solution such as src:name or an entire pkg-src "ecosystem superset". I guess that's the problem at hand here, not unlike the problem of browsing packages.ubuntu.com or bugs.debian.org for pkg vs. src:pkg (the hint is necessary in those cases)

On the latter, perhaps purl could allow for scheme:type[-src]/namespace/name@version?qualifiers#subpath with [-src] being optional and treated as an exclusive case in the SDKs, where each ecosystem could optionally implement -src handlers by creating a class just for that case. The differences between this and making it a qualifier as @stevespringett indicates for Maven are:

  1. Not sure how the SDKs and client downstream can more elegantly allow optional source package handling for each ecosystem, if with an idiomatic, optional -src "hint" or with a qualifier.
  2. It would probably be helpful if the qualifier name were the same across all ecosystems, with optional implementation and graceful failure. Perhaps source=yes or something.

I guess there could also be things like resolving to homonymous packages of different types, picking the binary by default with some 301 logic, etc., all of which sounds a bit nightmarish (but is in fact the type of situations we face in real life, you asked for "nginx", did you mean "src:nginx"? Anyway, here's the nginx deb.)

Just my 2e-2.

@sschuberth
Member

They are not the same file format. Binary packages are the usual .deb and source packages are usually two or three files referenced from a signed metadata file, .dsc.

For me, that's enough of a justification for a dedicated debsrc type. While not made explicitly clear in the purl spec, I believe same type should imply same format / "protocol", which would not be the case here.

However, I would not specify a dedicated general -src suffix for the type. Otherwise we would need to do the same for -doc and what not for consistency.

The rule I'd propose is:

  • Use the same type if it's the same file format / file extension / mime type / "protocol", otherwise use another type.
  • If it's the same file format / file extension / mime type / "protocol" but with different semantics, clarify that with a qualifier like classifier or packaging (we should probably standardize the name of the qualifier here).

@iamwillbar
Member

Thanks @bureado for your input here! I agree with @sschuberth's codification of how we should apply this going forward (it would be great to add those rules to the specification to provide guidance to future contributors).

Unless there's objection I'd suggest @gernot-h follow that scheme for this PR.

@stevespringett
Member

stevespringett commented Nov 22, 2019

There was literally an issue about Docker/OCI Image formats in which the opposite point was argued. See #68.

The spec does not describe the package format, only how packages are located. Per the first sentence of the spec:

A purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages

Considering that the source packages and all the variations of binary packages (e.g. arch) are all located the same way, it doesn't make any sense to provide two PURL types that resolve to the same location. Doing so will likely lead to a lot of changes to the current spec, and potentially to some of the PURL types that we still haven't fully investigated yet.

@iamwillbar
Member

I think in #68 the argument was because they were located differently they should have different schemes, I think we effectively deferred the discussion as to whether the format plays a specific role. I think in all of the previously defined types you can infer the format of the package from the scheme because there was effectively a 1:1 mapping (or more accurately *:1 mapping) of schemes to package format. In this case we now have ambiguity where the package is located the same way but the format once you locate it is different, so we should make an explicit decision (and update the spec accordingly) about how we should handle that.

If the spec only cared about identity then I would pretty comfortably say that we should not differentiate scheme based on format because our primary concern is uniqueness. However, if you're trying to locate something then there's a strong possibility that you're wanting to do something with it, and in that case knowledge about what format to expect once you locate it is probably pretty important.

For example, the host and path aspects of http and https are identical because how you locate the resource is the same, but the scheme is different because the format (protocol) of how you talk to the host once you locate it is different.

@stevespringett
Member

If you want to differentiate formats and have different PURL types for each, I'll give you a reason why this would greatly complicate things.

A typical Maven repo for an artifact will consist of:

  • *.jar (including -sources and -javadoc)
  • *.asc
  • *.md5
  • *.sha1
  • *.pom

Here we have a repo containing binaries (which could also be war, apk, etc), pgp signatures, text files, and xml files for a single version of an artifact. It is expected that the consumer of the PURL know how to handle the various file formats. In the case of jar, war, apk, ear, it is also expected that the consumer know how the layout of the zips vary by format. PURL doesn't provide any of this guidance nor should it. The identification and location is currently what PURL is scoped with.

Another potential impact of having different PURL types for different formats is if you're analyzing PURL using OSS Index or a similar service, you may have to make multiple requests in order to identify it. For example, if I'm using a package, but I compiled it from the source package, I would have to make a special request just for this case instead of making a single request to handle all format types. The potential for false negatives and general confusion will be elevated.

@pombredanne
Member

The key question is whether the Source and (Package or Binary) are tightly related or not and if there is ambiguity on how to locate the corresponding (possibly many) files once you know this is a deb package type.

As a reasonably long time Debian user, there is an un-ambiguous relationship between these AFAIK. One Source can yield one or more Binary packages and every Debian control file that provides details about a Binary also list its Source. And the inverse is true: every Package also lists its Source. (This relationship could be made explicit in a database of sorts for some use case that want to keep these relations explicit when using multiple Package URLs to track Debian packages)

And knowing the name/version of a Source or Binary is enough to identify and locate everything about a package (including going from Source to Binary(ies)). By everything I mean locating all sources, binaries, control and copyright files, and any other system URL that deals with packages.
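The binary-to-source direction of that relationship can be sketched in a few lines. This is a minimal illustration under stated assumptions, not something taken from the PR: it assumes the control fields of a binary package have already been parsed into a dict, and that the `Source` field follows the usual Debian convention (omitted when the source name and version match the binary's, and carrying the source version in parentheses when it differs). The function name `source_of` is hypothetical.

```python
import re

# "lvm2 (2.03.02-3)" -> name "lvm2", version "2.03.02-3"; the version is optional.
_SOURCE_FIELD = re.compile(r"^(?P<name>\S+)(?:\s+\((?P<version>[^)]+)\))?$")


def source_of(binary_fields: dict) -> tuple:
    """Return (source_name, source_version) for a binary package.

    binary_fields is assumed to hold already-parsed control fields
    ("Package", "Version" and optionally "Source").
    """
    package = binary_fields["Package"]
    version = binary_fields["Version"]
    source = binary_fields.get("Source")
    if source is None:
        # No Source field: source name and version equal the binary's.
        return package, version
    match = _SOURCE_FIELD.match(source.strip())
    if match is None:
        raise ValueError(f"unexpected Source field: {source!r}")
    # A Source field without a version means the versions are identical.
    return match["name"], match["version"] or version


# The lvm2 example mentioned earlier in this thread.
fields = {
    "Package": "libdevmapper1.02.1",
    "Version": "2:1.02.155-3",
    "Source": "lvm2 (2.03.02-3)",
}
print(source_of(fields))  # ('lvm2', '2.03.02-3')
```

Note that this needs the package metadata as input; the mapping cannot be computed from the binary purl string alone.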

Based on all this, I am much in favor of @gernot-h's proposed changes here and @stevespringett's argument that Debian sources do not deserve a special type but rather a qualifier. As a side note, the same would apply to RPMs and many if not all other package types that have a dual source/binary split personality.

The things to resolve would then be IMHO:

  1. decide if a deb package without qualifier is a source or binary package (I would tend to think this is a binary by default)

  2. decide how to document source vs. binary
    A.- use a type qualifier as suggested in this PR by @gernot-h
    B.- or refine this and define a "standard" qualifier name as suggested by @bureado to reuse across all Package URL types.
    C.- reuse the arch qualifier for source as suggested by @lamby (BTW thank you for chiming in... an honour to have a former DPL here)

Either B or C seem a tad more explicit than the "type" word. And with C. there could be a possible confusion between an arch=noarch (as used in RPMs) and an arch=source.

@gernot-h
Contributor Author

gernot-h commented Nov 26, 2019

If I only think about Debian, I would be tempted to agree with @bureado's initial suggestion to introduce a separate type (given that source and binary handling are quite separate in Debian).

However, @pombredanne reminded me to keep the other distributions in mind - and thinking of all the numerous low and high level packaging formats used in the Linux world (plus PyPI, Ruby Gems, ...), introducing additional types will load the purl spec with unnecessary details over time. I agree with @pombredanne that in the end, source and binary packages are always closely related - and an application which handles artifacts of a distribution (i.e. purl type) will know how to handle its format.

Regarding your final questions:

Re 1.: I think, the default without a qualifier should be binary package for Debian & other binary distributions. This is the thing a "normal user" means when he talks about package X. And probably this is what the purl-spec defines today, so changing this would probably break semantics.

Re 2.: I would love to see a "standard" qualifier to reference source packages for all the binary package formats out there. I'm not really decided regarding arch=source, type=source or classifier=source(s). "arch" sounds wrong because "source" is no "architecture", but guaranteeing mutual exclusiveness between source and binary packages sounds like a good thing to save the world a lot of confusion. At least until the first distribution arrives introducing different source packages for different binary architectures...

That all said, the initial comment of @stevespringett reminded me of a totally different use case, when he wrote that "classifier=sources" "resolves" to the sources for a package. If we have a source package "foo" which creates a binary package "bar", do we want a purl of "pkg:deb/debian/bar...?classifier=source" to "resolve" to the source package foo? I don't think a purl should do this, or?

@gernot-h
Contributor Author

@henning-schild, as user of a source distribution and knowing dirty details of a lot of distros, can you have a look at this discussion (see known purl types as background) - and check if we overlooked an important aspect or use case? TIA!

@bureado

bureado commented Nov 27, 2019

Re 1.: I think, the default without a qualifier should be binary package for Debian & other binary distributions. This is the thing a "normal user" means when he talks about package X. And probably this is what the purl-spec defines today, so changing this would probably break semantics.

+1, and @pombredanne's B or C lgtm, no strong opinion but watching the discussion.

You both brought up something interesting:

If we have a source package "foo" which creates a binary package "bar", do we want a purl of "pkg:deb/debian/bar...?classifier=source" to "resolve" to the source package foo? I don't think a purl should do this, or?

and

As a reasonably long time Debian user, there is an un-ambiguous relationship between these AFAIK. One Source can yield one or more Binary packages and every Debian control file that provides details about a Binary also list its Source. And the inverse is true: every Package also lists its Source.

I also noticed a distro qualifier for type deb. I'm not sure what a purl user is expecting to do with this. In Debian and derivatives, package:version tuple may or may not be found in the distro repo. The version might have been superseded in a point release, or if a release codename is not used, the package might be gone altogether. The granular way would be to use a timestamp and using an archive mirror to ensure you're resolving to the expected view. I'm just not sure if that would be codified in the distro repo.

That's just an observation which probably has no bearing in this PR but it relates a bit to the question of deriving and locating a source package based on binary package attributes, particularly for .debs that have been built apocryphally (without a source package) or that are offered only as binary downloads in the case of proprietary software.

One last observation, a key operational difference between a src.rpm and a dsc is that a src.rpm can be installed in a running system.

gernot-h added a commit to siemens/purl-spec that referenced this pull request Feb 4, 2020
This allows to refer to Debian source packages. Often there is no
trivial mapping between source and binary packages (several binaries
built from one source, names as well as versions can differ between
binary and source packages!), so package-urls shall allow to explicitly
specify source packages.  Using the arch qualifier for this was
suggested by (former DPL) Chris Lamb and William Bartholomew in
package-url#57 and "avoids people
doing weird things like ?arch=amd64&type=source". To stay consistent
with former versions of the spec, a deb package without qualifier shall
refer to a binary package.

Signed-off-by: Gernot Hillier <[email protected]>
@gernot-h gernot-h dismissed stale reviews from ghost and bufferoverflow via 801cd75 February 4, 2020 13:29
@gernot-h
Contributor Author

gernot-h commented Feb 4, 2020

Sorry for the long delay, but as we were not really decided whether to use @pombredanne's B or C, I hoped for some more opinions here. But hey, as we have a suggestion by a former DPL here, let's use that one. :) And I really like the mutual exclusion between arch=amd64 and arch=source as pointed out by @iamwillbar.

So I reworked my commit to use qualifier "arch=source" and tried to incorporate some of our reasoning into the commit message.
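To illustrate what this convention means for a consumer, here is a minimal Python sketch. It is an assumption-laden illustration rather than a reference implementation: the function name `deb_package_kind` and the example purls are hypothetical, and the parsing is a deliberately simple string split rather than a full purl parser. It treats `arch=source` as a source package and everything else, including a missing `arch` qualifier, as a binary package.

```python
from urllib.parse import parse_qs


def deb_package_kind(purl: str) -> str:
    """Classify a pkg:deb purl as "source" or "binary" from its arch qualifier.

    Hypothetical helper: a purl without an arch qualifier is treated as
    binary, matching the default discussed in this PR.
    """
    if not purl.startswith("pkg:deb/"):
        raise ValueError(f"not a pkg:deb purl: {purl!r}")
    # Drop an optional "#subpath", then take everything after "?" as qualifiers.
    without_subpath = purl.partition("#")[0]
    query = without_subpath.partition("?")[2]
    qualifiers = parse_qs(query)
    arch = qualifiers.get("arch", [None])[0]
    return "source" if arch == "source" else "binary"


# Illustrative purls with made-up package names and versions.
print(deb_package_kind("pkg:deb/debian/foo@1.0-1?arch=source"))  # source
print(deb_package_kind("pkg:deb/debian/bar@1.0-1+b1?arch=amd64"))  # binary
print(deb_package_kind("pkg:deb/debian/bar@1.0-1"))  # binary
```

Because a single `arch` qualifier carries the distinction, a purl cannot claim to be both `amd64` and `source` the way `?arch=amd64&type=source` could.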

Looking forward to your review!

@bureado

bureado commented Jun 19, 2021

FWIW, APT 2.0 seems to use src: for source package pinning: https://blog.jak-linux.org/2020/03/07/apt-2.0/

@MarkLodato
Contributor

Friendly ping. Is it looking like arch=source is the likely solution? Any chance on making a decision soon?

FWIW, arch=source is what I and my colleagues naturally would have expected, but really any decision is better than no decision.

@gernot-h
Contributor Author

gernot-h commented Dec 7, 2021

Same here. I'm happy to update the PR to any decision, be it arch=source which could probably fit other use cases too or using src: which sounds more like a Debian-specific solution.

bureado
bureado previously approved these changes Dec 8, 2021
@bureado

bureado commented Dec 8, 2021

This PR lgtm, I'm in favor of merging to unblock more Debian + purl scenarios. The proposal is reasonable and it's not incompatible with future iterations if we learn more from how people are using it.

@gernot-h
Contributor Author

gernot-h commented Dec 8, 2021

Thanks, @bureado! I just rebased accordingly.

pombredanne
pombredanne previously approved these changes Dec 9, 2021
Member

@pombredanne pombredanne left a comment

@gernot-h Thank you ++ for your patience... This is looking good.
I have pushed a few minor cosmetic refinements for your consideration in siemens#1 on top of your branch.
I can merge as-is with these.

Fix dpkg command syntax and refine definition

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@gernot-h
Contributor Author

@gernot-h Thank you ++ for your patience... This is looking good. I have pushed a few minor cosmetic refinements for your consideration in siemens#1 on top of your branch. I can merge as-is with these.

Thank you, @pombredanne, for reviewing and improving the wording! My PR is now updated with your fixes.

Member

@pombredanne pombredanne left a comment

LGTM! Thanks

9 participants